pacman::p_load(jsonlite,dplyr,tidyr,stringr,lubridate,tidyverse,readtext,ggplot2,visNetwork,stringr,ggpubr, igraph, patchwork,igraph,ggraph,ggrepel)Take Home 3

Credit: John Blank Photography
1. Introduction
FishEye International monitors business records of commercial fishing operators in Oceanus to identify and prevent illegal fishing. Analysts work with company data, including ownership, shareholders, transactions, and products/services, to build the CatchNet Knowledge Graph.
Last year, SouthSeafood Express Corp was caught illegally fishing, leading to its closure. FishEye wants to understand the temporal patterns and infer how the fishing market reacted to this event, as some businesses may have tried to capture SouthSeafood’s market share, while others may have become more cautious about illegal activities.
FishEye aims to develop visualization tools for CatchNet to identify influential people in business networks, considering the varied and changing shareholder and ownership relationships.
2. Mini-Challenge 3: Temporal Analysis
2.1 Tasks and Questions:
Develop an approach using visual analytics to highlight temporal patterns and changes in corporate structures, focusing on identifying the most active individuals and businesses.
Utilize visualizations to display typical and atypical business transactions, such as mergers and acquisitions, and infer the underlying motivations behind changes in their activity levels.
Create a visual approach to examine inferences about how a company’s influence evolves over time and whether ownership or influence over a network can be deduced.
Identify and visualize the network of SouthSeafood Express Corp and competing businesses, and how they changed due to SouthSeafood’s illegal fishing behavior.
Determine which companies benefited from SouthSeafood’s legal troubles and if there are other suspicious transactions related to illegal fishing, providing visual evidence to support your conclusions.
Data Source: VAST Challenge 2024: Mini-Challenge 3
3. Data
The MC3 dataset is a comprehensive collection comprising 60,520 nodes (entities) and 75,817 edges (relationships or connections) organized into 4,782 distinct components. The nodes in this dataset represent various types of entities, including individuals (Person), chief executive officers (CEO), companies, and other organizational structures.
On the other hand, the edges capture different types of relationships or interactions between these nodes. Some examples of edge types include shareholdership (ownership of shares in a company), beneficial ownership (enjoying the benefits of owning a property or asset without being the legal owner), and potentially other forms of associations or transactions.
The dataset is structured such that the nodes key contains a list representing all the node entities, with each node carrying attributes or properties that describe its characteristics, such as ID, type, country, revenue, founding date, and potentially other relevant details.
Correspondingly, the links key holds a list that represents all the edges or connections between these nodes. Each edge entry typically includes properties like the edge type, start and end dates (if applicable), and identifiers for the source and target nodes involved in the relationship.
With this comprehensive dataset capturing both node entities and their interconnections through various types of edges, researchers and analysts can conduct in-depth analyses to uncover patterns, understand the dynamics of relationships, and gain insights into the complex network of entities and their interactions.
3.1 Data Preparations
- jsonlite: This package provides functionality for parsing and generating JSON data in R.
- dplyr: A part of the tidyverse, dplyr provides a consistent set of verbs for data manipulation, making it easier to transform and summarize data frames.
- tidyr: Another tidyverse package, tidyr helps in creating tidy data sets by providing functions for reshaping data frames.
- stringr: This package provides a cohesive set of functions for string manipulation and regular expressions in R.
- lubridate: lubridate is designed to make it easier to work with date-time data in R.
- tidyverse: The tidyverse is a collection of R packages designed for data science, including dplyr, ggplot2, tidyr, and others.
- readtext: readtext makes it easier to import and handle text data in R, particularly for text mining and analysis.
- ggplot2: Part of the tidyverse, ggplot2 is a powerful data visualization package for creating complex and publication-ready plots.
- visNetwork: visNetwork is a package for creating interactive network visualizations in R, using the vis.js library.
- ggpubr: ggpubr provides easy-to-use functions for creating publication-ready plots and combining multiple ggplot2 plots into a single figure.
- igraph: igraph is a collection of network analysis tools for creating and analyzing graphs and networks in R.
patchwork: patchwork is a package for composing multiple ggplot2 plots into a single figure, with easy layout control.
igraph: A package for creating and analysing graphs and networks.
ggraph: A package for creating graph-based data visualisations using the ‘ggplot2’ syntax.
ggrepel: A package for automatically adjusting text labels to avoid overlapping in ‘ggplot2’ visualisations.
Click to show code
Click to show code
mc3 <- fromJSON("data/mc3.json")To effectively perform temporal analysis on such data, it can be beneficial to separate the data set into two distinct data frames: one for the node entities and another for the relationships or edges between them.
Click to show code
# Load necessary libraries
library(dplyr)
library(tidyr)
library(stringr)
# Assuming mc3 is already loaded as a list containing nodes dataframe
mc3_nodes <- as_tibble(mc3$nodes) %>%
mutate(
id = as.character(id),
type = as.character(type),
country = as.character(country),
ProductServices = as.character(ProductServices),
revenue_omu = as.numeric(revenue),
head_of_org = as.character(HeadOfOrg),
TradeDescription = as.character(TradeDescription),
PointOfContact = as.character(PointOfContact),
id = gsub("^[0-9]+\\.\\s*", "", id), # Clean up IDs
revenue_omu = ifelse(is.na(revenue_omu), 0, revenue_omu)
) %>%
mutate(
type = ifelse(type == "Entity.Person", "Entity.Person.Person", type) # Rename Entity.Person
) %>%
rename(
last_edited_by = `_last_edited_by`,
last_edited_date = `_last_edited_date`,
date_added = `_date_added`,
raw_source = `_raw_source`,
algorithm = `_algorithm`
) %>%
select(everything())
# Display the first few rows of the updated dataframe
print(head(mc3_nodes))# A tibble: 6 × 17
type country ProductServices PointOfContact HeadOfOrg founding_date revenue
<chr> <chr> <chr> <chr> <chr> <chr> <dbl>
1 Entity… Uziland Unknown Rebecca Lewis Émilie-S… 1954-04-24T0… 5995.
2 Entity… Mawala… Furniture and … Michael Lopez Honoré L… 2009-06-12T0… 71767.
3 Entity… Uzifri… Food products Steven Robert… Jules La… 2029-12-15T0… 0
4 Entity… Islava… Unknown Anthony Wyatt Dr. Víct… 1972-02-16T0… 0
5 Entity… Oceanus Unknown Tracy Schmidt Jacques … 1954-04-06T0… 4747.
6 Entity… Imazam Fish, crustace… Corey Moore J… Thierry … 2031-09-30T0… 46567.
# ℹ 10 more variables: TradeDescription <chr>, last_edited_by <chr>,
# last_edited_date <chr>, date_added <chr>, raw_source <chr>,
# algorithm <chr>, id <chr>, dob <chr>, revenue_omu <dbl>, head_of_org <chr>
(mc3_nodes$type) %>% unique()[1] "Entity.Organization.Company"
[2] "Entity.Organization.LogisticsCompany"
[3] "Entity.Organization.FishingCompany"
[4] "Entity.Organization.FinancialCompany"
[5] "Entity.Organization.NewsCompany"
[6] "Entity.Organization.NGO"
[7] "Entity.Person.Person"
[8] "Entity.Person.CEO"
# Further processing to separate the 'type' column
mc3_nodes <- mc3_nodes %>%
separate(type, into = c("type_1", "type_2", "type_3"), sep = "\\.", fill = "right", extra = "drop")
# Display the first few rows of the updated dataframe
print(head(mc3_nodes))# A tibble: 6 × 19
type_1 type_2 type_3 country ProductServices PointOfContact HeadOfOrg
<chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Entity Organization Company Uziland Unknown Rebecca Lewis Émilie-S…
2 Entity Organization Company Mawalara Furniture and … Michael Lopez Honoré L…
3 Entity Organization Company Uzifrica Food products Steven Robert… Jules La…
4 Entity Organization Company Islavara… Unknown Anthony Wyatt Dr. Víct…
5 Entity Organization Company Oceanus Unknown Tracy Schmidt Jacques …
6 Entity Organization Company Imazam Fish, crustace… Corey Moore J… Thierry …
# ℹ 12 more variables: founding_date <chr>, revenue <dbl>,
# TradeDescription <chr>, last_edited_by <chr>, last_edited_date <chr>,
# date_added <chr>, raw_source <chr>, algorithm <chr>, id <chr>, dob <chr>,
# revenue_omu <dbl>, head_of_org <chr>
(mc3_nodes$type_2) %>% unique()[1] "Organization" "Person"
(mc3_nodes$type_3) %>% unique()[1] "Company" "LogisticsCompany" "FishingCompany" "FinancialCompany"
[5] "NewsCompany" "NGO" "Person" "CEO"
Click to show code
mc3_edges <- as_tibble(mc3$links) %>%
mutate(
type = as.character(type),
type_new = stringr::str_extract(type, "[^.]+$"),
source = as.character(source),
target = as.character(target),
last_edited_by = as.character(`_last_edited_by`),
last_edited_date = as.Date(`_last_edited_date`),
date_added = as.Date(`_date_added`),
raw_source = as.character(`_raw_source`),
algorithm = as.character(`_algorithm`)
) %>%
mutate(
start_date = as.Date(start_date),
end_date = if_else(is.na(end_date), as.Date(NA), as.Date(end_date)),
id = target
) %>%
# Remove `_last_edited_by`, `_last_edited_date`, `_date_added`, `_raw_source`, `_algorithm` if they exist
select(-c(`_last_edited_by`, `_last_edited_date`, `_date_added`, `_raw_source`, `_algorithm`)) %>%
rename(
last_edited_by = last_edited_by,
last_edited_date = last_edited_date,
date_added = date_added,
raw_source = raw_source,
algorithm = algorithm
) %>%
select(everything())Click to show code
mc3_edges <- mc3_edges %>%
mutate(
start_date = as.POSIXct(start_date),
last_edited_date = as.POSIXct(last_edited_date),
date_added = as.POSIXct(date_added),
end_date = if_else(is.na(end_date), as.POSIXct(NA), as.POSIXct(end_date)),
id = target
)
mc3_nodes <- mc3_nodes %>%
mutate(
last_edited_date = as.POSIXct(last_edited_date),
date_added = as.POSIXct(date_added),
founding_date = as.POSIXct(founding_date),
dob = as.POSIXct(dob)
)Click to show code
mc3_edges [duplicated(mc3_edges ),]# A tibble: 0 × 13
# ℹ 13 variables: start_date <dttm>, type <chr>, source <chr>, target <chr>,
# key <int>, end_date <dttm>, type_new <chr>, last_edited_by <chr>,
# last_edited_date <dttm>, date_added <dttm>, raw_source <chr>,
# algorithm <chr>, id <chr>
mc3_nodes [duplicated(mc3_nodes ),]# A tibble: 0 × 19
# ℹ 19 variables: type_1 <chr>, type_2 <chr>, type_3 <chr>, country <chr>,
# ProductServices <chr>, PointOfContact <chr>, HeadOfOrg <chr>,
# founding_date <dttm>, revenue <dbl>, TradeDescription <chr>,
# last_edited_by <chr>, last_edited_date <dttm>, date_added <dttm>,
# raw_source <chr>, algorithm <chr>, id <chr>, dob <dttm>, revenue_omu <dbl>,
# head_of_org <chr>
Click to show code
nodes <- mc3_nodes %>%
select(
type_2, type_3, id, dob, country,
head_of_org, revenue, last_edited_by,
last_edited_date, date_added, raw_source, algorithm
)
edges <- mc3_edges %>%
select(
type_new, id, start_date, end_date,
last_edited_by, last_edited_date, date_added,
raw_source, algorithm
)What are the Company Types?
# Count the unique values in type_2
type_2_unique_counts <- mc3_nodes %>%
group_by(type_2) %>%
summarise(count = n()) %>%
arrange(desc(count))
# Display the summary table
print(type_2_unique_counts)# A tibble: 2 × 2
type_2 count
<chr> <int>
1 Person 51649
2 Organization 8871
Extracting the top 10 id to see the different company types in edges and nodes file.
Click to show code
top_10_ids <- edges %>%
group_by(id) %>%
summarize(count = n()) %>%
arrange(desc(count)) %>%
slice(1:10)Create a bar chart for nodes and edges.
Click to show code
barchart_type_counts_nodes <- mc3_nodes %>%
count(type_3, sort = TRUE)
barchart_type_counts_edges <- mc3_edges %>%
count(type_new, sort = TRUE)
b1 <- ggplot(barchart_type_counts_nodes, aes(x = reorder(type_3, -n), y = n)) +
geom_bar(stat = "identity", fill = "lightblue") +
labs(
title = "Distribution of Nodes Types",
x = "Nodes Type",
y = "Count"
) +
theme_minimal() +
theme(
axis.text.x = element_text(angle = 0, hjust = 0.8, size = 7)
)
b2 <- ggplot(barchart_type_counts_edges, aes(x = reorder(type_new, -n), y = n)) +
geom_bar(stat = "identity", fill = "lightblue") +
labs(
title = "Distribution of Edges Types",
x = "Edge Type",
y = "Count"
) +
theme_minimal() +
theme(
axis.text.x = element_text(angle = 0, hjust = 0.5, size = 7)
)combined_plot <- b1 / b2
print(combined_plot)
Nodes Type:
‘
Person,’ has count exceeding 50,000 and is the most common entities in the network. ‘Companies’ numbering around 5,000‘
CEO’ are present but much fewer, reflecting a small proportion of individuals with this specific role.Other organization types, such as ‘
FishingCompany,’ ‘LogisticsCompany,’ ‘FinancialCompany,’ ‘NewsCompany,’ and ‘NGO,’ are present in minimal numbers.‘
Shareholdership,’ has over 35,000 instances, indicating ownership stakes in companies as a primary form of interaction.‘
BeneficialOwnership’ is the second most common relationship, with around 20,000 instances. ‘Worksfor’ is slightly lesser than beneficialownership with around 15,000 instances.‘
FamilyRelationship’ connections are the least represented, suggesting that familial ties are not as prominent as business and ownership relations in the dataset.Top companies and the Types
Network Graph
Click to show code
library(dplyr)
library(igraph)
library(visNetwork)
# Assuming mc3_edges is already loaded as a dataframe
# Get the top 20 companies based on the count of edges
top_companies <- mc3_edges %>%
count(id, sort = TRUE) %>%
top_n(20, wt = n)
# Filter edges to include only those involving the top 20 companies
filtered_edges <- mc3_edges %>%
filter(id %in% top_companies$id) %>%
select(id, type_new)
# Prepare edges for graph creation
edges_for_graph <- filtered_edges %>%
rename(from = id, to = type_new)
# Create a bipartite graph
bipartite_graph <- graph_from_data_frame(d = edges_for_graph, directed = FALSE)
# Assign types for bipartite mapping
V(bipartite_graph)$type <- bipartite_mapping(bipartite_graph)$type
# Prepare nodes for visualization
nodes_vis <- data.frame(
id = V(bipartite_graph)$name,
label = V(bipartite_graph)$name,
group = ifelse(V(bipartite_graph)$type, "Type", "Company")
)
# Prepare edges for visualization
edges_vis <- igraph::as_data_frame(bipartite_graph, what = "edges")
# Create and customize the visNetwork graph
vis1 <- visNetwork(nodes_vis, edges_vis, width = "100%", height = "800px") %>%
visNodes(shape = "dot", scaling = list(label = list(enabled = TRUE))) %>%
visEdges(arrows = "none", color = list(color = "lightgray", highlight = "red")) %>%
visOptions(highlightNearest = TRUE, nodesIdSelection = TRUE) %>%
visLegend() %>%
visGroups(groupname = "Type", shape = "dot", color = "blue") %>%
visGroups(groupname = "Company", shape = "dot", color = "green") %>%
visLayout(randomSeed = 21) %>%
visInteraction(navigationButtons = TRUE) %>%
visPhysics(stabilization = FALSE, enabled = FALSE)# Display the graph
vis1The graph helps in understanding the complex web of interactions in the dataset, highlighting how entities are interconnected through various types of relationships. Each node represents an entity, such as a company or individual, and the edges depict the relationships between them, including shareholdership, beneficial ownership, and employment links. For example,
Mosley and Sons has connection to shareholdership, worksfor and beneficial ownership while Nguyen, Shelton and Hayes only has connection to shareholdership and beneficial ownership.Companies and amount of types
Click to show code
top_10_ids <- mc3_edges %>%
group_by(id) %>%
summarize(count = n()) %>%
arrange(desc(count)) %>%
slice(1:10)
filtered_edges <- mc3_edges %>%
filter(id %in% top_10_ids$id)
type_new_counts <- filtered_edges %>%
group_by(id, type_new) %>%
summarise(count = n(), .groups = "drop") %>%
arrange(desc(count))
type_new_colors <- c("Shareholdership" = "pink",
"BeneficialOwnership" = "lightblue",
"WorksFor" = "lightgreen")
wrap_text <- function(text, width) {
wrapped_text <- str_replace_all(text, "-", " ")
wrapped_text <- str_wrap(wrapped_text, width = width)
wrapped_text <- str_replace_all(wrapped_text, " ", "-")
return(wrapped_text)
}
type_new_counts <- type_new_counts %>%
mutate(id_wrapped = wrap_text(id, 10))
type_new_counts <- type_new_counts %>%
mutate(id_wrapped = factor(id_wrapped, levels = type_new_counts %>%
group_by(id_wrapped) %>%
summarize(total_count = sum(count)) %>%
arrange(desc(total_count)) %>%
pull(id_wrapped)))
b4 <- ggplot(type_new_counts, aes(x = id_wrapped, y = count, fill = type_new)) +
geom_bar(stat = "identity", position = position_dodge(width = 0.9), width = 0.7) +
geom_text(aes(label = count), position = position_dodge(width = 0.9), vjust = 0.8, size = 3) + # Add text labels
labs(x = " ", y = " ", title = "Top Names with Multiple Type Relationships") +
theme_minimal() +
theme(
axis.text.x = element_text(angle = 0, hjust = 0.5, size = 8),
legend.position = "bottom"
) +
scale_fill_manual(values = type_new_colors, name = "Type")print(b4)
This bar chart visualises the top names with multiple types of relationships in the network. The x-axis lists the top entities, while the y-axis represents the count of different relationship types, illustrating how some organisations or individuals are central to multiple types of interactions within the network.
Downs Group and Mills, Atkinson and Chavez have the highest number of relationships, indicating their significant roles in various types of network interactions.Which Companies has the highest Revenue?
Click to show code
library(plotly)
company_revenue <- mc3_nodes %>%
group_by(id, type_3) %>%
summarise(total_revenue = sum(revenue_omu, na.rm = TRUE), .groups = 'drop')
plot <- plot_ly(data = company_revenue,
x = ~id,
y = ~total_revenue,
text = ~paste("Name:", id,
"<br>Total Revenue:", total_revenue,
"<br>Type:", type_3),
type = "scatter",
mode = "markers",
marker = list(size = ~total_revenue / 1000000,
sizemode = 'area',
sizeref = 0.1,
color = ~total_revenue,
colorscale = "Viridis",
showscale = TRUE))
plot <- plot %>%
layout(title = "Interactive Bubble Chart of Company Revenue",
xaxis = list(title = "", showticklabels = FALSE), # Hide x-axis labels
yaxis = list(title = "Total Revenue"),
hovermode = "closest")plotThis bubble chart visualises the distribution of total revenue for different entities in the network. Each bubble represents an entity, with the size of the bubble corresponding to the total revenue and the colour indicating the magnitude of the revenue. The largest yellow bubble represents
Briggs-Wilson, a company with a total revenue of approximately 310.6 million.Irregular Patterns by Revenue
Extracting year from full date as using the full date format is too detail for analysis.
edges <- edges %>%
mutate(
Start_Year = year(start_date),
End_Year = year(end_date)
)
nodes <- nodes %>%
mutate(added_year = year(date_added))The id with top 8 revenues had been extracted and visualise using heat map.
Click to show code
edges <- edges %>%
mutate(
Start_Year = year(start_date),
End_Year = year(end_date)
)
nodes <- nodes %>%
mutate(added_year = year(date_added))
top_8_revenue_nodes <- nodes %>%
arrange(desc(revenue)) %>%
slice_head(n = 8)
MC2_node_abnor <- edges %>%
filter(id %in% top_8_revenue_nodes$id) %>%
group_by(id, Start_Year) %>%
summarise(weight = n(), .groups = "drop")
MC2_node_abno <- edges %>%
filter(id %in% top_8_revenue_nodes$id) %>%
group_by(id, Start_Year) %>%
summarise(weight = n(), .groups = "drop")
g1 <- ggplot(MC2_node_abnor, aes(Start_Year, id)) +
geom_tile(aes(fill = weight)) +
geom_text(aes(label = weight), size = 3) +
labs(title = "Irregular Pattern by Revenue") +
scale_fill_gradient(low = "white", high = "lightblue") +
theme(axis.text.x = element_text(angle = 0, hjust = 1, size = 8)) +
theme(axis.text.y = element_text(size = 6))print(g1)
This heat map visualises the irregular patterns in revenue for the top 8 revenue companies over time. Each cell represents a specific entity and year, with the numbers indicating the weight or frequency of revenue changes. More details of transaction and relationship examination should be done to check any atypical business activities.
-
Williams, Stanley and Butlet had a unusually high activity count of 9 in 2025.-
Yang PLC had peaks in 2012, 2021 and 2022.-
Short, Hernandez and Myers had irregular activity peaks at 2011 and 2020.-
Roth Ltd had isolated peaks in 2010 and 2025.-
Pearson-Williams had high activity during 2015 to 2017 with a drop off after.-
Dunlap, Fleming and Brown had significant peak in 2025 but low activity overall.-
Cooper, Jacobs and Gonzalez had high activities from 2018 to 2021.-
Briggs-Wilson had peaks in 2010 and 2028 but low activity overall.Network Visualization with Node and Edge Groups
This graph visualises the relationships between various entities, highlighting different types of organisations and their connections.
Click to show code
mc3_nodes2 <- mc3_nodes %>%
mutate(id = as.character(id),
type_3 = as.character(type_3))
mc3_edges2 <- mc3_edges %>%
mutate(source = as.character(source),
target = as.character(target),
type_new = as.character(type_new))
edges2 <- mc3_edges2 %>%
select(from = source, to = target, type_new)
all_nodes <- unique(c(edges2$from, edges2$to))
nodes2 <- mc3_nodes2 %>%
filter(id %in% all_nodes) %>%
select(id, type_3) %>%
distinct() %>%
rename(name = id, group = type_3)
missing_nodes <- setdiff(all_nodes, nodes2$name)
if (length(missing_nodes) > 0) {
additional_nodes <- data.frame(name = missing_nodes, group = "Unknown")
nodes2 <- bind_rows(nodes2, additional_nodes)
}
nodes2 <- nodes2 %>%
distinct(name, .keep_all = TRUE)
graph <- graph_from_data_frame(d = edges2, vertices = nodes2, directed = TRUE)
nodes_vis <- data.frame(id = V(graph)$name, label = V(graph)$name, group = V(graph)$group)
edges_vis <- igraph::as_data_frame(graph, what = "edges")
vis1 <- visNetwork(
nodes_vis,
edges_vis,
width = "100%",
main = list(
text = "Network Visualization with Node and Edge Groups",
style = "font-size:17px; font-weight:bold; text-align:right;"
)
) %>%
visIgraphLayout(layout = "layout_with_fr") %>%
visGroups(groupname = "Unknown", color = "#D3D3D3") %>%
visGroups(groupname = "Company", color = "#1696d2") %>%
visLegend() %>%
visEdges(arrows = "to", color = list(color = "lightgray", highlight = "red")) %>%
visOptions(
highlightNearest = list(enabled = TRUE, degree = 2, hover = TRUE),
nodesIdSelection = TRUE,
selectedBy = "group",
collapse = TRUE
) %>%
visInteraction(navigationButtons = TRUE) %>%
visPhysics(stabilization = FALSE, enabled = FALSE)
unique_types <- unique(edges2$type_new)
for (t in unique_types) {
vis1 <- vis1 %>%
visGroups(groupname = t, color = scales::hue_pal()(length(unique_types))[which(unique_types == t)])
}vis1Click to show code
filtered_edges <- edges %>%
filter(!is.na(start_date) & !is.na(end_date))
filtered_edges <- filtered_edges %>%
mutate(year = year(start_date))
top_entities <- filtered_edges %>%
group_by(type_new) %>%
count(id) %>%
top_n(10, wt = n) %>%
ungroup() %>%
pull(id)
filtered_edges_top <- filtered_edges %>%
filter(id %in% top_entities)
nodes <- filtered_edges_top %>%
select(id, type_new) %>%
distinct() %>%
rename(name = id)
all_nodes <- unique(c(filtered_edges_top$id, filtered_edges_top$type_new))
missing_nodes <- setdiff(all_nodes, nodes$name)
if (length(missing_nodes) > 0) {
additional_nodes <- data.frame(name = missing_nodes, type_new = NA)
nodes <- bind_rows(nodes, additional_nodes)
}
nodes <- nodes %>%
distinct(name, .keep_all = TRUE)
node_shapes <- c(
"Fishing-related Companies" = 22,
"Unknown" = 21,
"Non-fishing Companies" = 23,
"Beneficial Owner" = 24,
"Ultimate Beneficial Owner" = 25,
"Multi-role Entity" = 20
)
node_colors <- c(
"Fishing-related Companies" = "blue",
"Unknown" = "grey",
"Non-fishing Companies" = "green",
"Beneficial Owner" = "yellow",
"Ultimate Beneficial Owner" = "red",
"Multi-role Entity" = "purple"
)
graph <- graph_from_data_frame(d = filtered_edges_top, vertices = nodes, directed = TRUE)
V(graph)$degree <- degree(graph, mode = "all")
layout <- create_layout(graph, layout = "fr")
g <- ggraph(layout) +
geom_edge_link(aes(color = factor(year)), alpha = 0.5, edge_width = 0.8) +
geom_node_point(aes(x = x, y = y, size = degree, shape = type_new, fill = type_new), color = "black", stroke = 0.5) +
scale_shape_manual(values = node_shapes) +
scale_fill_manual(values = node_colors) +
scale_color_manual(values = rainbow(length(unique(filtered_edges_top$year)))) + # Assign colors to years
geom_text_repel(aes(x = x, y = y, label = name), size = 2.5) +
theme_void() +
theme(legend.position = "right") +
labs(title = "Network Visualization of Top 30 Entities by Type and Year",
color = "Year",
fill = "Type")print(g)
This network visualisation showcases the top 30 entities, highlighting their connections and interactions from 2005 to 2034. Larger nodes signify entities with more connections, illustrating their influence within the network. The separate cluster shows that
workfor is separated from shareholdership and beneficialownership.Activity Counts per Year with Outliers Highlighted
Click to show code
filtered_edges <- edges %>%
filter(!is.na(start_date) & !is.na(end_date))
filtered_edges <- filtered_edges %>%
mutate(year = year(start_date))
activity_counts <- filtered_edges %>%
group_by(year, id, type_new) %>%
summarise(activity_count = n()) %>%
ungroup()
activity_counts <- activity_counts %>%
group_by(type_new) %>%
mutate(
Q1 = quantile(activity_count, 0.25),
Q3 = quantile(activity_count, 0.75),
IQR = Q3 - Q1,
lower_bound = Q1 - 1.5 * IQR,
upper_bound = Q3 + 1.5 * IQR,
is_outlier = activity_count < lower_bound | activity_count > upper_bound
) %>%
ungroup()
o1 <- ggplot(activity_counts, aes(x = year, y = activity_count, color = is_outlier)) +
geom_point() +
geom_text_repel(aes(label = ifelse(is_outlier, id, "")), size = 2.5) +
facet_wrap(~ type_new) +
labs(title = "Activity Counts per Year with Outliers Highlighted",
x = "Year", y = "Activity Count", color = "Outlier") +
theme_minimal()print(o1)
Click to show code
filtered_edges <- edges %>%
filter(!is.na(start_date) & !is.na(end_date))
filtered_edges <- filtered_edges %>%
mutate(year = year(start_date))
activity_counts <- filtered_edges %>%
group_by(year, id, type_new) %>%
summarise(activity_count = n()) %>%
ungroup()
activity_counts <- activity_counts %>%
group_by(type_new) %>%
mutate(
Q1 = quantile(activity_count, 0.25),
Q3 = quantile(activity_count, 0.75),
IQR = Q3 - Q1,
lower_bound = Q1 - 1.5 * IQR,
upper_bound = Q3 + 1.5 * IQR,
is_outlier = activity_count < lower_bound | activity_count > upper_bound
) %>%
ungroup()
ggplot(activity_counts, aes(x = year, y = activity_count, color = is_outlier)) +
geom_point() +
geom_text_repel(aes(label = ifelse(is_outlier, id, "")), size = 2.5) +
facet_wrap(~ type_new, scales = "free_y") + # Using free_y scale for better visualization
labs(title = "Activity Counts per Year with Outliers Highlighted",
x = "Year", y = "Activity Count", color = "Outlier") +
scale_color_manual(values = c("FALSE" = "blue", "TRUE" = "red")) + # Color coding outliers
theme_minimal() +
theme(
axis.text.x = element_text(angle = 45, hjust = 1),
plot.title = element_text(size = 14, face = "bold")
)
The scatter plot displays activity counts per year, categorised by BeneficialOwnership, Shareholdership, and WorksFor relationships. The y-axis represents the activity count, while the x-axis shows the year. Outliers, highlighted in blue, indicate entities with unusually high activity levels.
Stephens-Lopez stands out with consistent high activity counts in the BeneficialOwnership category around 2030. Similarly, Duncan-Williams shows significant activity in the early 2020s in the same category. In Shareholdership, Haas-Tran and Blair PLC display peaks around 2030 and 2025, respectively.outliers_summary <- activity_counts %>%
filter(is_outlier) %>%
select(year, id, type_new, activity_count) %>%
arrange(type_new, year, desc(activity_count))print(outliers_summary)# A tibble: 31 × 4
year id type_new activity_count
<dbl> <chr> <chr> <int>
1 2014 Wright LLC BeneficialOwnership 2
2 2015 Wright LLC BeneficialOwnership 2
3 2017 Nichols-Esparza BeneficialOwnership 2
4 2022 Wright LLC BeneficialOwnership 2
5 2023 Nichols-Esparza BeneficialOwnership 2
6 2027 Smith-Ramirez BeneficialOwnership 4
7 2029 Smith-Ramirez BeneficialOwnership 2
8 2030 Smith-Ramirez BeneficialOwnership 2
9 2031 Smith-Ramirez BeneficialOwnership 5
10 2032 Stephens-Lopez BeneficialOwnership 7
# ℹ 21 more rows
ggplot(outliers_summary, aes(x = year, y = id, fill = activity_count)) +
geom_tile(color = "white") +
scale_fill_gradient(low = "lightblue", high = "red") +
facet_wrap(~ type_new, scales = "free_y") +
labs(title = "Heatmap of Outliers in Activity Counts",
x = "Year", y = "Entity", fill = "Activity Count") +
theme_minimal() +
theme(
axis.text.x = element_text(angle = 45, hjust = 1),
plot.title = element_text(size = 14, face = "bold")
)
This heatmap illustrates outliers in activity counts for BeneficialOwnership and Shareholdership relationships. Entities like
Wright LLC and Stephens-Lopez exhibit significant outliers in BeneficialOwnership, while Woods Inc and Weaver-Price show unusual activity in Shareholdership. The activity counts have increased since late 2020, indicating abnormal patterns.